
[example] add a deep residual net example #1041

Closed
shuokay wants to merge 2 commits into apache:master from shuokay:master

Conversation

@shuokay (Contributor) commented Dec 23, 2015

Hi, as mentioned in #1022, I have completed a simple residual net. Because I don't have enough GPUs at the moment, I have only tested it on Tiny ImageNet. I hope this helps you brew your own residual net.

@piiswrong @antinucleon @winstywang please review this PR.

@winstywang (Contributor)

Please verify the results on cifar first.

@mli (Contributor) commented Dec 24, 2015

many thanks for the PR. it looks like you are using a simplified resnet? how about refactoring the symbol definition into examples/image-classification/symbol_resnet-small.py, so that we can train it on cifar10 with

python train_cifar10.py --network resnet-small ...

ps. you may want to choose a better name, but i hope to reserve symbol_resnet.py for the one that can reproduce the paper's results on imagenet
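For reference, a minimal sketch of what such a file could look like, assuming the usual get_symbol(num_classes) entry point that the image-classification training scripts load; the depth, filter counts, and layer names below are illustrative assumptions, not the PR's actual code:

```python
# Hypothetical sketch of examples/image-classification/symbol_resnet-small.py;
# depth, filter counts, and layer names are illustrative, not the PR's code.
import mxnet as mx

def residual_unit(data, num_filter, stride, dim_match, name):
    """conv-bn-relu -> conv-bn, plus shortcut, then relu (post-activation)."""
    conv1 = mx.symbol.Convolution(data=data, num_filter=num_filter, kernel=(3, 3),
                                  stride=stride, pad=(1, 1), name=name + '_conv1')
    bn1 = mx.symbol.BatchNorm(data=conv1, name=name + '_bn1')
    act1 = mx.symbol.Activation(data=bn1, act_type='relu', name=name + '_relu1')
    conv2 = mx.symbol.Convolution(data=act1, num_filter=num_filter, kernel=(3, 3),
                                  stride=(1, 1), pad=(1, 1), name=name + '_conv2')
    bn2 = mx.symbol.BatchNorm(data=conv2, name=name + '_bn2')
    if dim_match:
        shortcut = data
    else:
        # 1x1 convolution to match the channel count and spatial size of the branch
        shortcut = mx.symbol.Convolution(data=data, num_filter=num_filter, kernel=(1, 1),
                                         stride=stride, name=name + '_proj')
    return mx.symbol.Activation(data=bn2 + shortcut, act_type='relu', name=name + '_out')

def get_symbol(num_classes=10):
    data = mx.symbol.Variable(name='data')
    body = mx.symbol.Convolution(data=data, num_filter=16, kernel=(3, 3),
                                 stride=(1, 1), pad=(1, 1), name='conv0')
    # three stages of residual units; a real network would repeat more units per stage
    for i, num_filter in enumerate([16, 32, 64]):
        stride = (1, 1) if i == 0 else (2, 2)
        body = residual_unit(body, num_filter, stride, dim_match=False, name='stage%d_unit1' % i)
        body = residual_unit(body, num_filter, (1, 1), dim_match=True, name='stage%d_unit2' % i)
    pool = mx.symbol.Pooling(data=body, kernel=(8, 8), pool_type='avg',
                             global_pool=True, name='global_pool')
    fc = mx.symbol.FullyConnected(data=mx.symbol.Flatten(data=pool),
                                  num_hidden=num_classes, name='fc')
    return mx.symbol.SoftmaxOutput(data=fc, name='softmax')
```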

@winstywang (Contributor)

What are your results on cifar10? If they match the paper, we can merge it in; otherwise we had better find the reasons, to avoid confusing others.

@futurely

The result in https://github.com/shuokay/mxnet/commit/cd28e159d2509bf8799ad36a3219ecd5ac1b0a4f is "train accuracy around 82.7% and test accuracy 75.6%".

@mli (Contributor) commented Dec 25, 2015

our baseline algorithm on cifar10 gets 90% test accuracy, see https://github.com/dmlc/mxnet/tree/master/example/image-classification#cifar-10

so i think there should be room for improvement

@shuokay (Contributor, Author) commented Dec 25, 2015

@winstywang @mli Sorry for disturbing you guys, I am busy with work these days; maybe I can get an improved result this weekend.

@shuokay (Contributor, Author) commented Dec 25, 2015

I pushed this commit just to set a checkpoint, and I will not push more commits until I get a reasonable result.

@mli (Contributor) commented Dec 26, 2015

@shuokay not at all, many thanks for your contributions, we are just trying to help improve the results

@wangg12 (Contributor) commented Dec 27, 2015

@shuokay In the paper, I don't see a relu before element-wise addition. But in your implementation, there is one. Did I misread that?
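For comparison, a small sketch of the two orderings being discussed, using placeholder symbols rather than the PR's code; the paper applies the element-wise addition first and the ReLU after it:

```python
# Sketch of the two orderings under discussion (placeholders, not the PR's code).
import mxnet as mx

branch = mx.symbol.Variable('branch')      # output of the block's last BatchNorm
shortcut = mx.symbol.Variable('shortcut')  # identity or 1x1-projected input

# Paper's ordering: element-wise addition first, then ReLU.
paper_out = mx.symbol.Activation(data=branch + shortcut, act_type='relu')

# Ordering being questioned: ReLU applied to the branch before the addition.
questioned_out = mx.symbol.Activation(data=branch, act_type='relu') + shortcut
```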

@shuokay (Contributor, Author) commented Dec 29, 2015

Updated the resnet-small example. There are some differences from the paper:

  • 1×1 convolution operators are used for increasing dimensions.
  • This is a small residual net consisting of 52 layers.
  • Data augmentation uses the mxnet defaults: center crop (instead of random crop) and random mirror, with no padding on the raw images and an input size of 28×28 (instead of 32×32).
  • Hyperparameters: initial lr=0.01, wd=0.00001.

Running this example with
python train_cifar10.py --network=resnet-small --lr=0.01 --lr-factor=0.1 --lr-factor-epoch=20 --num-epochs=30
I get a final train accuracy of about 91.75% and a test accuracy of about 84.30%.

ps: I think magnitude should be 2.0 when using the kaiming weight init method, because Var[w_l] = 2/n_l; however, when I set magnitude=2, the net doesn't converge.
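For context, a minimal sketch of the initializer point above: in MXNet, a Gaussian Xavier initializer with factor_type='in' and magnitude=2 gives Var[w_l] = 2/n_l, i.e. the Kaiming/MSRA scheme; the commented-out usage with the old FeedForward API is a hypothetical illustration, not the PR's training script.

```python
# Sketch only: the MXNet initializer corresponding to Kaiming/MSRA init
# (Gaussian, scaled by fan-in), i.e. Var[w_l] = magnitude / n_l with magnitude=2.
import mxnet as mx

kaiming_like_init = mx.init.Xavier(rnd_type='gaussian', factor_type='in', magnitude=2.0)

# Hypothetical usage with the old FeedForward API (training details are assumptions):
# model = mx.model.FeedForward(symbol=net, num_epoch=30, learning_rate=0.01,
#                              wd=0.00001, initializer=kaiming_like_init)
```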

@shuokay (Contributor, Author) commented Dec 29, 2015

@wangg12 you are right, I have updated the example code.

@antinucleon (Contributor)

Closing for now. Feel free to open a PR again when you get it working.
